Safe option-critic: learning safety in the option-critic architecture
نویسندگان
چکیده
منابع مشابه
The Option-Critic Architecture
Temporal abstraction is key to scaling up learning and planning in reinforcement learning. While planning with temporally extended actions is well understood, creating such abstractions autonomously from data has remained challenging. We tackle this problem in the framework of options [Sutton, Precup & Singh, 1999; Precup, 2000]. We derive policy gradient theorems for options and propose a new ...
متن کاملA critic-critic architecture to combine reinforcement and supervised learnings
In real life, learning is greatly speeded-up by the intervention of a teacher who gives examples, or shows, how to perform a certain task. In all this abstract, we let apart structural simpli cations of the problem by the designer which to not deal explicitely with learning. The intervention of the teacher can be realized in di erent ways: verbal explanation, demonstration, guidance, shaping th...
متن کاملHumanoids Learning to Walk: A Natural CPG-Actor-Critic Architecture
The identification of learning mechanisms for locomotion has been the subject of much research for some time but many challenges remain. Dynamic systems theory (DST) offers a novel approach to humanoid learning through environmental interaction. Reinforcement learning (RL) has offered a promising method to adaptively link the dynamic system to the environment it interacts with via a reward-base...
متن کاملActor-Critic Policy Learning in Cooperative Planning
In this paper, we introduce a method for learning and adapting cooperative control strategies in real-time stochastic domains. Our framework is an instance of the intelligent cooperative control architecture (iCCA). The agent starts by following the “safe” plan calculated by the planning module and incrementally adapting its policy to maximize the cumulative rewards. Actor-critic and consensusb...
متن کاملThe Eigenoption-Critic Framework
Eigenoptions (EOs) have been recently introduced as a promising idea for generating a diverse set of options through the graph Laplacian, having been shown to allow efficient exploration Machado et al. [2017a]. Despite its first initial promising results, a couple of issues in current algorithms limit its application, namely: 1) EO methods require two separate steps (eigenoption discovery and r...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: The Knowledge Engineering Review
سال: 2021
ISSN: 0269-8889,1469-8005
DOI: 10.1017/s0269888921000035